TINE: A Metric to Assess MT Adequacy

Authors

  • Miguel Rios
  • Wilker Aziz
  • Lucia Specia
Abstract

We describe TINE, a new automatic evaluation metric for Machine Translation that aims at assessing segment-level adequacy. Lexical similarity and shallow semantics are used as indicators of adequacy between machine and reference translations. The metric combines a lexical matching component and an adequacy component. Lexical matching is performed by comparing bags of words without any linguistic annotation. The adequacy component consists of: i) using ontologies to align predicates (verbs), ii) using semantic roles to align predicate arguments (core arguments and modifiers), and iii) matching predicate arguments using distributional semantics. TINE's performance is comparable to that of previous metrics at segment level for several language pairs, with average Kendall's tau correlations ranging from 0.26 to 0.29. We show that the addition of the shallow-semantic component improves the performance of simple lexical matching strategies and of metrics such as BLEU.
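For illustration only, the sketch below shows one way the two components could be combined: a bag-of-words lexical overlap interpolated with an adequacy term. The function names, the normalization by reference length, the interpolation weight alpha, and the stubbed adequacy value are assumptions made for this sketch, not the paper's exact formulation.

from collections import Counter

def lexical_match(hypothesis, reference):
    # Bag-of-words overlap without linguistic annotation,
    # normalized by reference length (normalization choice is an assumption).
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((hyp & ref).values())
    return overlap / max(len(reference.split()), 1)

def tine_score(hypothesis, reference, adequacy, alpha=0.5):
    # Interpolate the lexical component with an adequacy component.
    # In the paper the adequacy component aligns predicates via ontologies
    # and matches semantic-role arguments with distributional semantics;
    # here it is passed in precomputed, and alpha is an assumed weight.
    return alpha * lexical_match(hypothesis, reference) + (1 - alpha) * adequacy

# Toy usage with a stubbed adequacy value:
print(tine_score("the cat sat on the mat",
                 "a cat was sitting on the mat",
                 adequacy=0.7))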

Similar articles

E-rating Machine Translation

We describe our submissions to the WMT11 shared MT evaluation task: MTeRater and MTeRater-Plus. Both are machine-learned metrics that use features from e-rater®, an automated essay scoring engine designed to assess writing proficiency. Despite using only features from e-rater and without comparing to translations, MTeRater achieves a sentence-level correlation with human rankings equivalent t...

MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles

We introduce a novel semi-automated metric, MEANT, that assesses translation utility by matching semantic role fillers, producing scores that correlate with human judgment as well as HTER but at much lower labor cost. As machine translation systems improve in lexical choice and fluency, the shortcomings of widespread n-gram based, fluency-oriented MT evaluation metrics such as BLEU, which fail ...

Unsupervised vs. supervised weight estimation for semantic MT evaluation metrics

We present an unsupervised approach to estimate the appropriate degree of contribution of each semantic role type for semantic translation evaluation, yielding a semantic MT evaluation metric whose correlation with human adequacy judgments is comparable to that of recent supervised approaches but without the high cost of a human-ranked training corpus. Our new unsupervised estimation approach i...

A Customizable MT Evaluation Metric for Assessing Adequacy (Machine Translation Term Project)

This project describes a customizable MT evaluation metric that provides system-dependent scores for the purposes of tuning an MT system. The features presented focus on assessing adequacy over fluency. Rather than simply examining features, this project frames the MT evaluation task as a classification question to determine whether a given sentence was produced by a human or a machine. Support Ve...

Semantic vs. Syntactic vs. N-gram Structure for Machine Translation Evaluation

We present results of an empirical study on evaluating the utility of the machine translation output, by assessing the accuracy with which human readers are able to complete the semantic role annotation templates. Unlike the widely-used lexical and n-gram based or syntactic based MT evaluation metrics, which are fluency-oriented, our results show that using semantic role labels to evaluate the ut...



Publication date: 2011